ABSTRACT

It is suggested that much thought should be given to 
choosing an appropriate computer language for an institutional
research office, considering the sophistication of the staff, types
of planned application, size and type of computer, and availability
of central programming support in the institution. For offices that 
prepare straight reports and inferential statistics a statistical 
language that provides report features is recommended; straight 
report language is suggested for offices not doing inferential 
statistics. Offices doing their own programming, in part, should keep
in mind that: (1) programs producing reports should be clearly
documented in English; (2) there should be no more than two languages
used in an office, and more than one person should know each
language; (3) new staff members should learn the languages used,
rather than introduce a new language the new staffer knows; and (4)
existing official data files should be used whenever possible rather
than creating separate new ones. (MSE)

A strongly felt need for a multipurpose and simple-to-use
computer language emerges as more and more institutional
research offices acquire computer terminals. A veritable Babel
of computer languages exists (Ryland, 1979), and a good deal of
thought should be given to choosing the right language. Learn-
ing a computer language represents a significant investment in
time and effort, and because of the problem of training new staff 
when turnover occurs, only one or, at most, two languages 
should be introduced into an office. 

The state of the art in languages has progressed far enough
that, in most instances, institutional researchers need not be-
come full-fledged computer programmers in order to access
computer data. Many of the high-level languages have taken the
drudgery out of programming, and having the computer do much 
of the data manipulation takes the drudgery out of institutional 
research. 

The choice of the best computer language depends on the 
sophistication of the staff, the types of planned application, the
size and type of computer, and the availability of central pro- 
gramming support. 

Training Considerations

The characteristics that make learning and using a language 
easier for the institutional researcher who has little computer 
background are these: syntax that looks like ordinary language; a
well-written manual; availability of training programs; a lan-
guage that is somewhat forgiving (that is, one that does not come
to a dead stop for every little infraction of the rules); a minimum 
of required coding; clear, understandable error messages; and 
job control language grouped at one end or the other of a 
multistep job. On-campus assistance from someone who knows 
the language is essential for the neophyte, especially in learning 
the system-specific job control language for the institution's 
installation; a good place to start is with an undergraduate
computer course in any language or a beginner's program 
offered by the computer center. Certain fundamental concepts 
are common to all languages. These include file structure and
input media, flow charting or other planning techniques,
language coding, and job control language. Once these are
learned, it is possible to gain a useful amount of skill in almost
any of the report or statistical languages with a manual, rela-
tively unimpeded access to a computer, and a lot of patience. Of
course, a training program helps, and access to someone who 
knows the language is even better. 

Applications

The types of projects done by the institutional research
office should be considered next. There are two basic classes of 
computer applications in institutional research. The first and 
most common is the sort-accumulate-list type of report. This
includes such things as count of students by race, by sex, by
classification; aggregation of student credit hours by discipline,
by level; cost figures by department; and government equal
opportunity reports, among many others. The second type of
application involves descriptive and inferential statistical 
analysis at varying levels of sophistication. This includes, for 
example, enrollment projection, analysis of questionnaires, sal-
ary regression analysis, and analysis of predictors of student
success. An efficient language will do both sets of applications.
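
As a minimal sketch of the sort-accumulate-list pattern, the following Python fragment counts students by sex within classification and totals student credit hours by discipline. The record layout, field names, and sample values are hypothetical and are not drawn from any institution's files; Python is used only for illustration and is not one of the languages reviewed here.

```python
from collections import defaultdict

# Hypothetical student records: (classification, sex, discipline, credit_hours)
RECORDS = [
    ("Freshman", "F", "Biology", 4.0),
    ("Freshman", "M", "History", 3.0),
    ("Senior", "F", "Biology", 3.0),
    ("Freshman", "F", "History", 3.0),
]


def headcount(records):
    """Accumulate a count of students for each (classification, sex) pair."""
    counts = defaultdict(int)
    for classification, sex, _discipline, _hours in records:
        counts[(classification, sex)] += 1
    return dict(counts)


def credit_hours_by_discipline(records):
    """Accumulate student credit hours for each discipline."""
    totals = defaultdict(float)
    for _classification, _sex, discipline, hours in records:
        totals[discipline] += hours
    return dict(totals)


def list_report(counts):
    """Return report lines sorted by classification and then by sex."""
    return [f"{c:<10}{s:<3}{n:>5}" for (c, s), n in sorted(counts.items())]


if __name__ == "__main__":
    print("\n".join(list_report(headcount(RECORDS))))
```

The three functions correspond to the three steps in the name: accumulate, sort, and list.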

Both types of application require file handling, and the 
file-handling ability of the language chosen is of utmost impor-
tance. Institutional research offices use data from many sources
and rarely can control file formats. While the easiest files to use
are fixed length sequential, more often than not the nature of
institutional research projects requires reading variable-length,
COBOL-generated files or matching information from a number
of different files. It is desirable to be able to extract small sets of
information from large files in order to reduce run time and cost.
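
The idea of extracting a small set of information from a large file can be sketched as follows. This Python fragment assumes a fixed length sequential file with a hypothetical layout (a nine-column student ID followed by a one-column level code); it streams the records one at a time and keeps only the selected columns of the selected records.

```python
import io


def extract(lines, keep, start, end):
    """Yield only the chosen columns of the records that satisfy `keep`.

    Records are read one at a time, so the large file never has to fit
    in memory. `start` and `end` are 0-based slice positions.
    """
    for line in lines:
        record = line.rstrip("\n")
        if keep(record):
            yield record[start:end]


# Hypothetical layout: columns 1-9 student ID, column 10 level (U or G).
SAMPLE = io.StringIO("000000001U\n000000002G\n000000003U\n")

if __name__ == "__main__":
    undergraduates = extract(SAMPLE, lambda r: r[9] == "U", 0, 9)
    print(list(undergraduates))
```

Later steps then work on the small extract rather than on the full file, which is where the saving in run time comes from.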

There are some automatic features which simplify report
writing: opening and closing of files, reading and writing of 
records, movement of data from input to output, initializing 
variables, accumulating totals, sorting, and report formatting 
with overrides available. Clear and understandable selection 
logic and the ability to manipulate character data as easily as 
numeric data are also important. 

It is possible to get usable, if not pretty, results almost 
immediately with automatic features, providing both a more
rapid return on invested time and an incentive to continue
learning. Automatic features, however, also tend to remove 
control from the programmer, so overrides are necessary if 
anything complex or unusual is to be produced.

Descriptive statistics such as percentages, frequency distri-
butions, and means are, by far, the most common applications of
statistical analysis; some of the nonstatistical packages offer 
these features. The next most frequently used statistics are 
regression and analysis of variance; one of the statistical pack-
ages will have to be selected for these uses. It should be noted
that some of the statistical packages are beginning to offer both 
statistics and report features. 
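
The descriptive statistics named above are simple to compute, which is why report packages can offer them. The sketch below, in Python with only its standard library, builds a frequency distribution with percentages and takes a mean. The questionnaire responses are hypothetical sample values.

```python
from collections import Counter
from statistics import mean


def frequency_distribution(values):
    """Return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(v, n, round(100.0 * n / total, 1)) for v, n in sorted(counts.items())]


# Hypothetical questionnaire responses on a 1-5 scale.
RESPONSES = [4, 5, 3, 4, 4, 2, 5, 4]

if __name__ == "__main__":
    for value, count, percent in frequency_distribution(RESPONSES):
        print(f"{value:>5}{count:>6}{percent:>8.1f}%")
    print("mean =", mean(RESPONSES))
```

Regression and analysis of variance need considerably more machinery, which is the reason for turning to a statistical package.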



Size and Type of Computer

While some of the more common languages (for example,
Fortran and COBOL) are machine independent, being avail-
able on almost any brand of computer, others, like IBM's
PL/I, have been developed for only one type. IBM and IBM-
type machines such as Amdahl, certain Itel, CDC, Magnuson,
and Ifyad models, comprise much of the computer market. For
this reason, languages designed for IBM computers tend to
dominate. It should be noted that even the nominally stan-
dardized languages, COBOL among them, are highly machine
dependent since each manufacturer tends to introduce features
which take advantage of that particular computer's unique 
capabilities. 

Size is also a factor. The amount of computer memory
available sets limits on the capabilities of the language and on the 
size of the data sets and number of variables a given job can 
handle. Some packages (SPSS and Mark IV, for example)
come in versions to fit progressively larger computers. Other
languages, such as Basic, are designed to operate best on small 
computers. As distributed data processing using microcomput- 
ers becomes more prevalent, available memory becomes more 
of a limiting factor, at least for modes of operation which are
independent of a large central mainframe.

Normally, an institutional research office will simply adopt 
a language that is already available locally. However, if cir-
cumstances arise in which an institutional research office "goes
shopping" for a report or statistical language, it would be wise to
get advice from someone who is familiar with hardware.

Availability of Central Computing Support

The motivation for acquiring an office terminal, in many 
cases, is the lack of sufficient resources in a central computing
facility to provide all the production programming needed for
reports, especially ad hoc reports. However, there are many
levels of independence from central support.

If central support is nonexistent due either to lack of 
hardware capability or staff, the institutional research office 
might consider acquiring a micro- or minicomputer and hiring or 
training a full-time programmer-analyst who can create and 
document an internal data system. (The documentation is even 
more important than the system itself.) Another option is to tie 
into a network on a timeshare basis, but since the cost of trial and
error can be truly awesome at the rates charged for computer
time, the presence in the office of a computer professional is 
recommended. 

More commonly, the central computing facility can provide
support for routine repetitive reports, but ad hoc reports or
summaries in a slightly different order remain time-consuming
hand jobs. The regular institutional research professionals, in
this case, can often learn a report or statistical language well
enough to increase dramatically the efficiency and timeliness of 
these reports and the productivity of the office. 

In the enviable case where the level of central support is 
high, the institutional research office might still want to use one 
of the statistical packages for more sophisticated analysis. In 
addition, if a database query language is available, the institu-
tional research staff can save time and misunderstandings by
specifying reports directly. 

Types of Language 

A language for use by institutional researchers, in most 
cases, should allow people who are not programmers to produce 
reports or do analysis independent of central computer-center 
support and with a minimum of coding. 

Ryland (1979) gives a comprehensive description of the 
types of proprietary software available today. What follows is a 
somewhat in-depth examination of representative samples of
different types of language. Table 1 summarizes the author's
opinion concerning language features of COBOL, Fortran,
Mark IV, Easytrieve, SPSS, and SAS. The ratings range from
"-2," which means poor or hard to use, to "+2," powerful or
easy to use. A "zero" signifies that the function does not exist or
is not applicable to the language. 

Programmer Languages 

Two of the oldest and most commonly used languages are
COBOL and Fortran. These are machine independent languages
which usually come with the larger computers, and national
standards exist for them. COBOL was developed for business
applications and uses ordinary English for its command syntax.
However, it tends to be excessively long-winded, and records
must be completely described, even if only part of the record is
being accessed by the program. It has no automatic features but
is a highly flexible language. It gives the programmer total
control over the data and the results, but it also requires the 
programmer to exercise that control at all times. 

COBOL is probably best mastered on the job, working with
someone who already knows it. Manuals are supplied by the
computer manufacturers, and their comprehensibility varies
from system to system; but COBOL, because of its complexity,
is not the type of language one can learn from a manual. COBOL
is taught almost anywhere computer courses are available. There
are also numerous texts available as well as an instructional
version of the language, from which the fundamentals can be
learned.

Fortran is a scientific language. It is the "number cruncher"
par excellence. It is, however, very poor at handling character
data; under normal operation, the maximum number of letters a
character variable can contain is four. Fortran, like COBOL, has
no automatic functions but does permit a great deal of flexibility
and control. It is symbol oriented rather than language oriented,
as expected of a scientific language. For example, compare the
following two read statements:

COBOL     READ "infile" FROM "FD-infile record" AT
              END GO TO "branch name."
Fortran   10 READ ("n", 20, END = "branch number")
              Prefix, Anum, Credit
          20 FORMAT (A4, F3.0, 4X, F4.1)

COBOL provides a separate file description which ties the 
variable name to its field size, type, and location. Fortran 
supplies a READ statement which lists the variable names and a 
FORMAT statement which provides the field size and type.
Fortran has fewer required statements than COBOL, making it
both more concise and somewhat easier to learn initially, but
because of its symbolic structure, it is more complex to follow.
COBOL is somewhat self documenting: variable names can be
up to 30 characters long, so if descriptive names are used, it is
fairly easy to follow the program logic without having separate
explanatory statements. Fortran, in contrast, must be completely
documented or programs become incomprehensible within a
short time, even to their creators. Courses in Fortran are proba- 
bly even more readily available than courses in COBOL, and a 
teaching version exists also. 
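
To make the meaning of the FORMAT specification (A4, F3.0, 4X, F4.1) concrete, here is a sketch in Python of how such a record is split into its three variables. It is an illustration, not a reproduction of any compiler's behavior: the function names and sample records are hypothetical, and treating an all-blank numeric field as zero is an assumption of the sketch.

```python
def read_f(field, decimals):
    """Interpret a numeric field in the manner of an Fw.d descriptor.

    An explicit decimal point in the data takes precedence; otherwise
    the last `decimals` digits are taken as the fraction. Treating an
    all-blank field as zero is an assumption of this sketch.
    """
    text = field.strip()
    if not text:
        return 0.0
    if "." in text:
        return float(text)
    return int(text) / 10 ** decimals


def read_record(line):
    """Split one record laid out as (A4, F3.0, 4X, F4.1)."""
    prefix = line[0:4]               # A4: four characters
    anum = read_f(line[4:7], 0)      # F3.0: three columns, no decimals
    # 4X: columns 8-11 are skipped
    credit = read_f(line[11:15], 1)  # F4.1: four columns, one decimal
    return prefix, anum, credit


if __name__ == "__main__":
    print(read_record("MATH101XXXX 3.0"))
    print(read_record("HIST2100000  45"))
```

The four-character prefix also shows the character-data limit mentioned above: a longer name would have to be spread across several variables.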

If Fortran and COBOL are the only two languages avail-
able, COBOL is probably the best choice for institutional
research, primarily because it handles character data and report
writing better than Fortran does.

Several programming aids have been developed recently
which are reported to make COBOL coding much more effi-
cient. COBOL, then, would be a fairly good choice for institu-
tional research reports if these aids are available to the institu-
tional researcher.

IBM's PL/I is reputed to have many of the advantages of both 
COBOL and Fortran, being language oriented but still being 
good for scientific calculations and having report capabilities 
also. 

Report Languages 

Mark IV, a file maintenance and report language, is a
proprietary language created by Informatics, Inc. It comes in
three progressively larger versions which operate on various
models of IBM-type, Univac, and Siemens computers (In-
formatics, Inc., 1977). It produces simple reports very easily, but
even moderately complex applications are quite difficult. The
natural environment for Mark IV is one in which there is a high
level of central support to create and maintain the catalog of files,
to supply the complex job control language, and to guide the 
casual user through the somewhat obtuse symbolic syntax and 
selection logic. 

The user's manual for Mark IV is in four volumes and
assumes a high level of sophistication. Informatics has released 
an index which may alleviate some of the problems associated 
with locating information in the manual (Informatics, Inc., 
1979). The language is well supported with training programs 
which use well-prepared lesson materials, all geared to business 
applications, not to higher education. There is a free-form 
special feature which takes more or less freely keyed information 
and pre-edits it for input to the regular Mark IV processor. This
adds an additional complication to a language which is already 
complicated enough. 

Mark IV handles files very well, especially those it creates,
but difficulties have been experienced by programmers using
files not created by Mark IV if key fields contain invalid data or if
the file is out of sequence. The language is designed to take 
advantage of the space savings permitted by variable fields. 

Mark IV lends itself to production applications which allow
users a certain amount of flexibility in specifying the contents
and order of a report. Mark IV permits users to specify reports
without having to concern themselves with the details of file
handling and description when a high level of central support for
file and catalog maintenance is available. It is capable of
providing, quite economically, numerous reports simultane-
ously from the same file systems. It is not as economical to run
Mark IV reports one at a time.

It is worthwhile for an institutional research office at an 
institution which already supports Mark IV to take advantage of
the report-writing flexibility it provides. It is too time consum-
ing, however, for the office to set up and maintain its own
Mark IV library. This is probably true also of any of the database
or quasi-database languages currently available.

Easytrieve, in contrast, requires little or no central support. 
This is a proprietary report language from Pansophic, Inc. which 
operates only on a large computer such as an IBM 360, 370, or 
an equivalent. It can be learned from the manual, is language 
oriented, and its free input form makes for easy terminal use.
Easytrieve handles data in any form: numeric, character, bi-
nary, packed, or other. Only the fields actually required for the
program need be defined to Easytrieve. It permits cataloging of 
file descriptions for future use, but it does not require such a 
catalog. Both of these features, plus automatic reading, writing, 
sorting, and totaling, help keep coding to a minimum.

Easytrieve's file extraction ability is somewhat awkward
and, while it does read variable records, getting it to sort on a
field from the variable portion of a record is a complicated task.
Its file handling strengths are its file matching and table search 
capabilities. It also automatically writes out the complete input 
record without the programmer having to specify output format, 
a feature which amounts to totally automatic movement of 
modified input to output data, a real advantage for file updating. 
Easytrieve automatically totals numeric data, writes the totals in
the same column as the detail information, and, in addition,
permits computations to be performed on these control totals. It
has a full set of automatic report format functions with fair ability
to override. Its main report limitation is that it won't normally
"go around corners" and continue output on the next line. When
the print line overflows, it comes to a halt until columns are 
closed with a space override or a field is removed. It can be 
instructed to write multiple-line reports, but the data then must 
be all character. The current version of Easytrieve requires job 
control language (JCL) between each job step in a multi-step 
program, but a new version has just been released which allows 
numerous job steps to be strung together without inserting JCL at 
every step. Easytrieve is an excellent sort-accumulate-list report
writer, and it makes economical use of computer time. 

None of the languages discussed has extensive statistical 
features. Easytrieve does sums automatically, Mark IV does
sums and averages and provides minimums and maximums, but 
none of them does inferential statistics automatically. One must 
turn to one of the statistical packages for these features. 

Statistical Languages 

The Statistical Package for the Social Sciences (SPSS),
from SPSS, Inc., is perhaps the best known of the statistical
languages. It is relatively machine independent and will operate
on the IBM 360, 370, or larger machines (or equivalent) or on
the CDC 6000, CYBER 70, Univac 1100, Xerox, and Burroughs
B6000 or B7000 (Nie, Hull, Jenkins, Steinbrenner, and Bent,
1975). It is based on the Fortran language and shares Fortran's
weakness in dealing with character data, although recent modifi-
cations have done much to alleviate this problem. The new SPSS
report feature (Hull, 1979) allows concatenation of character
fields so that names can be listed properly. The new report
feature also provides many desirable report format features 
including automatic totals, means, and other descriptive statis- 
tics, and in addition, it allows calculations with the summary 
variables. The SPSS manual is one of the best written manuals 
available, explaining clearly how to use the language and 
including adequate examples. It is also a fairly good statistical 
reference work. SPSS offers a complete collection of inferential 
statistics including various types of correlation and regression, 
analysis of variance, discriminant analysis, factor analysis, 
canonical correlation, and in the latest version, a full array of 
nonparametric statistics. 

This new version also offers expanded data-handling 
capabilities, reading packed data, zoned decimal data, and 
double precision numeric data. A major file-handling weakness 
of SPSS, for institutional research, is that it reads only fixed 
length files. However, it does read multiple files, merge data,
and add variables as well as create extract files and write out
statistically derived variables such as correlation matrices. It
also permits cataloging of SPSS data sets and procedures for
future use.

SPSS, like Fortran, is a symbolically oriented language,
and the record descriptions and assignment of variable names
occur normally in two different program statements. The fixed
format source code of SPSS is not as easy a form for terminal use
as is a free format input. SPSS has just recently become a much
more useful language for institutional research with the introduc-
tion of its new report writer and extended data capabilities. It will
not, however, process data on variable length files.

Another, newer statistical language is the Statistical
Analysis System (SAS), marketed by SAS Institute, Inc. SAS is
only available for the IBM 360 or 370 or larger machines or
equivalents. There are a number of manuals for SAS, from the
primer for beginners (Helwig, 1978) to the programmer's man-
ual (Helwig & Reinhardt, 1979). It sometimes
seems that the more flexible a language is, the more complex it is
to learn, and although SAS can be learned from the manual, it
does take time to tie the pieces together. Unlike SPSS, SAS gives
very little detail about the statistical principles behind its proce-
dures, but it does provide references to texts from which the
algorithms were taken.

SAS is excellent for file manipulation. It reads both fixed
and variable length files and does extraction of data fairly well. It
processes all types of data format and has a number of different
ways to specify file description. It is suggested, however, that a
user settle on one style, to avoid confusion. SAS will read
multiple files in a single job and will sort, match, merge, add
new variables, update, and subset very easily. An automatic 
table look-up is about the only feature it lacks. 
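
The sort, match, and merge operations can be illustrated generically. The Python sketch below joins two small files on a shared key field. It is not SAS code and makes no claim about how SAS carries out a merge internally; the file contents, field names, and the `match_merge` function are hypothetical.

```python
def match_merge(master, transactions, key):
    """Merge two lists of records (dicts) on a shared key field.

    Each master record is combined with the matching transaction
    record, if any; unmatched master records pass through unchanged.
    Assumes the key is unique within `transactions`.
    """
    lookup = {t[key]: t for t in transactions}
    merged = []
    for m in sorted(master, key=lambda r: r[key]):
        row = dict(m)
        row.update(lookup.get(m[key], {}))
        merged.append(row)
    return merged


# Hypothetical files: a registration file and a grade file keyed on student ID.
REGISTRATION = [{"id": "002", "major": "HIST"}, {"id": "001", "major": "BIOL"}]
GRADES = [{"id": "001", "gpa": 3.4}]

if __name__ == "__main__":
    for row in match_merge(REGISTRATION, GRADES, "id"):
        print(row)
```

A report language that does this matching automatically spares the institutional researcher from writing and testing such logic for every pair of files.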

SAS has a completely automatic report feature, PROC
PRINT, which automatically compresses data columns as far as 
possible and starts a second print line if data still overflows a 
single line. There is a second report option which permits 
complete specification of the report format using PUT state-
ments.
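
The difference between a report writer that halts when the print line overflows and one that starts a second print line can be shown with a small sketch. The Python function below packs fields onto lines of a given width and continues on a new line when the next field would not fit. It illustrates the general idea only and is not a description of the PROC PRINT algorithm; the function name, sample values, and width are hypothetical.

```python
def print_lines(values, width):
    """Pack fields onto print lines, starting a new line on overflow."""
    lines, current = [], ""
    for value in values:
        field = str(value)
        candidate = field if not current else current + " " + field
        if current and len(candidate) > width:
            # The field does not fit: close this line and start another.
            lines.append(current)
            current = field
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in print_lines(["Smith", "Biology", 3.4, 120], 14):
        print(line)
```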

SAS accesses the IBM operating system and provides
central processing unit (CPU) time data for procedures as well as
record counts and label information for data sets. The language
has a fairly complete array of descriptive and inferential statis-
tics. It has fewer nonparametric functions than the new SPSS but
has some parametric options which SPSS lacks, especially for
exploratory regression analysis. In contrast, SPSS shows how to
handle dummy variable coding for regression, while SAS does
not. Both languages allow interfaces with other statistical pack-
ages. SAS will interface with BMDP, OSIRIS, SPSS, and
DATA-TEXT and will use statistical procedures from those
packages instead of using its own, while retaining its own data 
structure; SPSS interfaces with OSIRIS. 
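
Dummy variable coding, mentioned above, turns a categorical variable with k categories into k - 1 columns of zeros and ones, with one category left out as the reference so that the columns are not perfectly collinear with the regression intercept. A minimal Python sketch follows, using hypothetical faculty ranks such as might appear in a salary regression; it does not reproduce the procedure given in either manual.

```python
def dummy_code(values, reference):
    """Return (column names, rows) with one 0/1 column per non-reference category."""
    categories = sorted(set(values) - {reference})
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical faculty ranks for a salary regression.
RANKS = ["Assistant", "Full", "Associate", "Assistant"]

if __name__ == "__main__":
    names, rows = dummy_code(RANKS, reference="Assistant")
    print(names)
    for row in rows:
        print(row)
```

Each coefficient estimated for a dummy column is then read as the difference between that category and the reference category.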

The SAS Institute sponsors an active user organization
which holds conferences and workshops and which publishes a
newsletter called "SAS Communications." Users are given an
opportunity, in an annual survey, to vote on the functions they 
want developed in new versions. 

SAS is probably the better of the two statistical languages, 
for institutional research, because of its file-handling ability and
its free-form, language-oriented syntax. However, its utility is
somewhat limited since it is available only on large, IBM-type
computers. 

Conclusion

A statistical language which provides report features is
probably the best language choice for an institutional research 
office which does both straight reports and inferential statistics. 
SAS, if it is available, is probably the more useful of the two 
described in this paper, mainly because of the ease with which it 
manipulates all types of files and its free-form, language-
oriented coding format, which lends itself to terminal operation.

If the office does not get involved in inferential statistics,
Easytrieve or an equivalent straight report language is prefera-
ble to either the file management languages, such as Mark IV, or
the COBOL-Fortran-PL/I type of programmer language.

An institutional research office which does some of its own 
programming should keep in mind the following: 

• Programs that produce reports should be documented 
clearly in English. When the staff member who wrote the
program leaves, it is almost impossible for another staff 
member to determine from the program code exactly 
what went into the report. 

• There should be only one or two languages used in the 
office, and more than one staff member should know
each one used. 

• A new staff member, who already knows another lan-
guage, should learn the "official" language or languages
rather than introducing a new one. Training time may be
increased, but it will be well worth the extra effort to
keep institutional research programs consistent and reus-
able under changing circumstances.

• Existing official data files should be used whenever
possible, rather than separate ones being created or 
maintained. A "freeze" schedule may have to be ar-
ranged to pick off data for storage from live data files, but
the temptation to create "your" data should be avoided;
"your" data can become inconsistent with "their" data
very quickly. If frozen files are used, the whole file in its
original format should be obtained to avoid having to 
rewrite the data access program when changes to the file 
are made. 

Learning a programming language can be a time- 
consuming and sometimes frustrating task, but once the lan- 
guage becomes familiar, it extends the capability of the institu- 
tional researcher to provide meaningful analysis by removing 
much of the burden of routine clerical detail. It is possible to see 
the forest when it is no longer necessary to count by hand all the
rings on all the trees. 